Monolingual Document Retrieval: English versus Other European Languages
Abstract
The vast majority of research in information retrieval is done using English collections and topics. This raises questions about how effective retrieval strategies are for other languages. To examine this issue, we focus on document retrieval in nine European languages. In particular, we investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding; of language-independent approaches, such as character n-gramming; and of combinations of the two types of approaches. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
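The abstract contrasts character n-gramming (language-independent) with stemming and decompounding (language-dependent), and mentions combining the two. The sketch below only illustrates those ideas under stated assumptions; it is not the authors' method. It assumes lowercase whitespace tokenization, n = 4 with no word-boundary padding, and a weighted CombSUM over min-max normalized scores as the combination rule. The function names `char_ngrams`, `min_max_normalize`, and `combine_runs` are hypothetical.

```python
from collections import defaultdict


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each lowercased word into overlapping character n-grams.

    Assumption: words of length <= n are kept whole, and no padding
    or cross-word n-grams are generated.
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one run's document scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as top-scoring.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_runs(run_a: dict[str, float], run_b: dict[str, float],
                 weight_a: float = 0.5) -> list[tuple[str, float]]:
    """Weighted CombSUM of two normalized runs (assumed combination rule).

    A document missing from one run contributes 0 from that run.
    Returns (doc_id, score) pairs sorted by descending combined score.
    """
    a, b = min_max_normalize(run_a), min_max_normalize(run_b)
    combined: dict[str, float] = defaultdict(float)
    for doc, s in a.items():
        combined[doc] += weight_a * s
    for doc, s in b.items():
        combined[doc] += (1.0 - weight_a) * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

For example, every 4-gram of `char_ngrams("Fahrrad")` also occurs in `char_ngrams("Fahrradweg")`, so n-gram matching can relate a German compound to one of its parts without an explicit decompounder. In a combination setting, `run_a` could hold scores from a stemmed run and `run_b` scores from an n-gram run of the same query.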
Similar resources
Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval
The Dublin City University group participated in the monolingual, bilingual and multilingual retrieval tasks this year. The main focus of our investigation was extending our retrieval system to document languages other than English, and completing the multilingual task comprising four languages: English, French, Russian and Finnish. Results from our French monolingual experiments indi...
Cross-Language Spoken Document Retrieval on the TREC SDR Collection
This paper presents preliminary experiments on cross-language spoken document retrieval (SDR) carried out on a benchmark assembled at ITC-irst. The benchmark is based on resources used in the last two spoken document retrieval tracks at the TREC conference, which are available on the Internet. They include automatic transcripts of American English broadcast news, short topics written in English,...
Cross-Lingual Information Retrieval System for Indian Languages
This paper describes our first participation in the Indian-language sub-task of the main ad hoc monolingual and bilingual track of the CLEF competition. In this track, the task is to retrieve relevant documents from an English corpus in response to a query expressed in different Indian languages including Hindi, Tamil, Telugu, Bengali and Marathi. Groups participating in this track are required to s...
Passage Retrieval vs. Document Retrieval in the Monolingual Task with the IR-n System
The paper describes our participation in monolingual tasks at CLEF 2006. We have submitted results for the following languages: English, French, Portuguese and Hungarian. We focused on studying different weighting schemes (Okapi and DFR) and retrieval strategies (passage retrieval and document retrieval) to improve retrieval performance. After an analysis of our experiments and of the official ...
Monolingual and Bilingual Experiments in GeoCLEF2006
This paper presents the results of our initial experiments in the monolingual English, Spanish and Portuguese tasks and the bilingual Spanish → English, Spanish → Portuguese, English → Spanish and Portuguese → Spanish tasks. Twenty runs were submitted as official runs, thirteen for the monolingual task and seven for the bilingual task. We used the Terrier Information Retrieval Platform to run e...